Higher order asymptotics for negative binomial regression inferences from RNA-sequencing data.

Authors

  • Yanming Di
  • Sarah C Emerson
  • Daniel W Schafer
  • Jeffrey A Kimbrel
  • Jeff H Chang
Abstract

RNA sequencing (RNA-Seq) is the current method of choice for characterizing transcriptomes and quantifying gene expression changes. This next generation sequencing-based method provides unprecedented depth and resolution. The negative binomial (NB) probability distribution has been shown to be a useful model for frequencies of mapped RNA-Seq reads and consequently provides a basis for statistical analysis of gene expression. Negative binomial exact tests are available for two-group comparisons but do not extend to negative binomial regression analysis, which is important for examining gene expression as a function of explanatory variables and for adjusted group comparisons accounting for other factors. We address the adequacy of available large-sample tests for the small sample sizes typically available from RNA-Seq studies and consider a higher-order asymptotic (HOA) adjustment to likelihood ratio tests. We demonstrate that 1) the HOA-adjusted likelihood ratio test is practically indistinguishable from the exact test in situations where the exact test is available, 2) the type I error of the HOA test matches the nominal specification in regression settings we examined via simulation, and 3) the power of the likelihood ratio test does not appear to be affected by the HOA adjustment. This work helps clarify the accuracy of the unadjusted likelihood ratio test and the degree of improvement available with the HOA adjustment. Furthermore, the HOA test may be preferable even when the exact test is available because it does not require ad hoc library size adjustments.
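
As a rough illustration of the unadjusted large-sample likelihood ratio test that the abstract takes as its starting point, the sketch below compares two groups of counts for a single gene under an NB model. It is not the authors' implementation and does not include the HOA adjustment. It assumes a known dispersion `phi` (variance = mu + phi * mu^2), equal library sizes, and no covariates; the function names `nb_loglik` and `lr_test_two_groups` are hypothetical.

```python
import math


def nb_loglik(counts, mu, phi):
    """Log-likelihood of i.i.d. NB counts with mean mu and dispersion phi.

    Assumed parameterization: variance = mu + phi * mu**2.
    """
    k = 1.0 / phi  # NB "size" parameter
    total = 0.0
    for y in counts:
        total += math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
        total += k * math.log(k / (k + mu))
        if y > 0:  # skip the y * log(...) term when y == 0 (avoids log(0) if mu == 0)
            total += y * math.log(mu / (k + mu))
    return total


def lr_test_two_groups(group_a, group_b, phi):
    """Unadjusted likelihood ratio test of equal means in two groups.

    With the dispersion held fixed, the MLE of the NB mean is the sample
    mean, so no iterative fitting is needed in this simplified setting.
    Returns (statistic, p_value) using the chi-square(1) approximation.
    """
    a, b = list(group_a), list(group_b)
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    pooled = (sum(a) + sum(b)) / (len(a) + len(b))

    ll_alt = nb_loglik(a, mean_a, phi) + nb_loglik(b, mean_b, phi)
    ll_null = nb_loglik(a + b, pooled, phi)

    stat = max(0.0, 2.0 * (ll_alt - ll_null))
    # Upper tail of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    stat, p = lr_test_two_groups([5, 6, 4], [50, 60, 55], phi=0.1)
    print(f"LR statistic = {stat:.3f}, p-value = {p:.3g}")
```

The chi-square reference distribution used here is the large-sample approximation whose accuracy at small sample sizes the abstract calls into question; the HOA adjustment described there modifies the test statistic and is not reproduced in this sketch.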


Similar resources

Estimation of Count Data using Bivariate Negative Binomial Regression Models

Abstract Negative binomial regression model (NBR) is a popular approach for modeling overdispersed count data with covariates. Several parameterizations have been performed for NBR, and the two well-known models, negative binomial-1 regression model (NBR-1) and negative binomial-2 regression model (NBR-2), have been applied. Another parameterization of NBR is negative binomial-P regression mode...

Goodness-of-Fit Tests and Model Diagnostics for Negative Binomial Regression of RNA Sequencing Data

This work is about assessing model adequacy for negative binomial (NB) regression, particularly (1) assessing the adequacy of the NB assumption, and (2) assessing the appropriateness of models for NB dispersion parameters. Tools for the first are appropriate for NB regression generally; those for the second are primarily intended for RNA sequencing (RNA-Seq) data analysis. The typically small n...

Models for Count Data With an Application to Healthy Days Measures: Are You Driving in Screws With a Hammer?

INTRODUCTION Count data are often collected in chronic disease research, and sometimes these data have a skewed distribution. The number of unhealthy days reported in the Behavioral Risk Factor Surveillance System (BRFSS) is an example of such data: most respondents report zero days. Studies have either categorized the Healthy Days measure or used linear regression models. We used alternative r...

Comparison between Efficiency of Poisson Regression Model and Negative Binomial Regression in the Analysis of Factors Affecting Mortality from Cardiovascular Diseases in Yazd Province in 2017

Introduction: Despite the advances in cardiovascular diseases, death caused by these diseases is still considered as the leading cause of mortality. In this study, some of the effective factors on the deaths caused by cardiovascular diseases were investigated. Methods: This cross-sectional analytical study investigated the efficacy of Poisson regression models and negative binomial regres...

Exploring the use of negative binomial regression modeling for pediatric peripheral intravenous catheterization

A large study conducted at two southeastern US hospitals from October 2007 through October 2008 sought to identify predictive variables for successful intravenous catheter (IV) insertion, a crucial procedure that is potentially difficult and time consuming in young children. The data was collected on a sample of 592 children that received a total of 1,195 attempts to start peripheral IV cathete...

Journal title:
  • Statistical applications in genetics and molecular biology

Volume 12, Issue 1

Pages: -

Publication date: 2013